
    A new SVD approach to optimal topic estimation

    In probabilistic topic models, the quantity of interest---a low-rank matrix consisting of topic vectors---is hidden in the text corpus matrix, masked by noise, and Singular Value Decomposition (SVD) is a potentially useful tool for learning such a matrix. However, different rows and columns of the matrix are usually on very different scales, and the connection between this matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models faces challenges. We overcome these challenges by introducing a proper pre-SVD normalization of the text corpus matrix and a proper column-wise scaling of the matrix of interest, and by revealing a surprising post-SVD low-dimensional {\it simplex} structure. The simplex structure, together with the pre-SVD normalization and column-wise scaling, allows us to conveniently reconstruct the matrix of interest, and motivates a new SVD-based approach to learning topic models. We show that under the popular probabilistic topic model \citep{hofmann1999}, our method has a faster rate of convergence than existing methods in a wide variety of cases. In particular, for cases where documents are long or $n$ is much larger than $p$, our method achieves the optimal rate. At the heart of the proofs is a tight element-wise bound on the singular vectors of a multinomially distributed data matrix, which does not exist in the literature and which we derive ourselves. We have applied our method to two data sets, Associated Press (AP) and Statistics Literature Abstract (SLA), with encouraging results. In particular, there is a clear simplex structure associated with the SVD of the data matrices, which largely validates our discovery.
    Comment: 73 pages, 8 figures, 6 tables; considered two different VH algorithms, OVH and GVH, and provided theoretical analysis for each; re-organized the upper bound theory; added a subsection comparing error rates with other existing methods; provided an improved error analysis via a Bernstein inequality for martingales
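
    As a concrete illustration of the SVD-then-simplex pipeline described in this abstract, the minimal sketch below normalizes a word-document frequency matrix, takes its leading singular vectors, forms entry-wise ratios against the first singular vector, and then hunts for the vertices of the resulting point cloud. The row normalization and the greedy farthest-point vertex hunting are simplified stand-ins assumed for illustration, not the paper's actual pre-SVD normalization or its OVH/GVH algorithms.

    import numpy as np

    def svd_topic_sketch(D, K):
        """Illustrative sketch of an SVD-based topic-estimation pipeline.

        D : (p, n) matrix of empirical word frequencies (columns sum to 1);
        K : number of topics. The normalization and the vertex-hunting step
        below are simplified placeholders, not the paper's exact procedure.
        """
        # Pre-SVD normalization (assumed form): downweight frequent words.
        m = np.maximum(D.mean(axis=1), 1e-12)
        D_norm = D / np.sqrt(m)[:, None]

        # SVD of the normalized corpus matrix; keep the leading K factors.
        U, s, Vt = np.linalg.svd(D_norm, full_matrices=False)
        U_K = U[:, :K]

        # Post-SVD step: entry-wise ratios of later singular vectors to the
        # first one; under the topic model these p points in R^{K-1}
        # concentrate near a simplex with K vertices (one per topic).
        denom = np.where(np.abs(U_K[:, [0]]) < 1e-12, 1e-12, U_K[:, [0]])
        R = U_K[:, 1:] / denom

        # Vertex hunting: a greedy farthest-point heuristic as a stand-in
        # for the paper's OVH/GVH algorithms.
        idx = [int(np.argmax(np.linalg.norm(R, axis=1)))]
        while len(idx) < K:
            dists = np.min(
                np.linalg.norm(R[:, None, :] - R[idx][None, :, :], axis=2), axis=1
            )
            idx.append(int(np.argmax(dists)))
        return R, R[idx]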

    Covariate assisted screening and estimation

    Consider a linear model $Y = X\beta + z$, where $X = X_{n,p}$ and $z \sim N(0, I_n)$. The vector $\beta$ is unknown but is sparse in the sense that most of its coordinates are $0$. The main interest is to separate its nonzero coordinates from the zero ones (i.e., variable selection). Motivated by examples in long-memory time series (Fan and Yao [Nonlinear Time Series: Nonparametric and Parametric Methods (2003) Springer]) and the change-point problem (Bhattacharya [In Change-Point Problems (South Hadley, MA, 1992) (1994) 28-56 IMS]), we are primarily interested in the case where the Gram matrix $G = X'X$ is non-sparse but sparsifiable by a finite-order linear filter. We focus on the regime where signals are both rare and weak, so that successful variable selection is very challenging but still possible. We approach this problem by a new procedure called covariate assisted screening and estimation (CASE). CASE first uses linear filtering to reduce the original setting to a new regression model where the corresponding Gram (covariance) matrix is sparse. The new covariance matrix induces a sparse graph, which guides us to conduct multivariate screening without visiting all the submodels. By interacting with the signal sparsity, the graph enables us to decompose the original problem into many separate small-size subproblems (if only we knew where they are!). Linear filtering also induces a so-called problem of information leakage, which can be overcome by the newly introduced patching technique. Together, these give rise to CASE, a two-stage screen-and-clean [Fan and Song Ann. Statist. 38 (2010) 3567-3604; Wasserman and Roeder Ann. Statist. 37 (2009) 2178-2201] procedure, where we first identify candidates of these submodels by patching and screening, and then re-examine each candidate to remove false positives.
    Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/14-AOS1243 by the Institute of Mathematical Statistics (http://www.imstat.org)
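
    The core mechanism, sparsifying a non-sparse Gram matrix with a finite-order linear filter and then screening on the filtered statistics, can be sketched as follows. The differencing filter, the threshold, and the per-coordinate screen are placeholder choices assumed for illustration; the actual CASE procedure uses graph-guided multivariate screening together with patching and a cleaning step.

    import numpy as np

    def case_filtering_sketch(X, Y, order=1, tau=0.5):
        """Illustrative sketch of the filtering idea behind CASE.

        A finite-order differencing filter F is applied to the summary
        statistics so that the filtered Gram matrix F @ G is (nearly)
        banded; a naive per-coordinate screen then stands in for CASE's
        graph-guided multivariate screening, patching, and cleaning.
        """
        n, p = X.shape
        G = X.T @ X / n              # Gram matrix, possibly non-sparse
        y_tilde = X.T @ Y / n        # sufficient summary of the data

        # Banded p x p filter: row i is e_i - e_{i+1} (first differences);
        # higher `order` repeats the differencing.
        F = np.eye(p)
        for _ in range(order):
            shifted = np.vstack([F[1:], np.zeros((1, p))])
            F = F - shifted

        G_f = F @ G                  # filtered Gram matrix: approximately sparse
        d = F @ y_tilde              # filtered statistics: d ~ G_f @ beta + noise

        # Placeholder screening rule (threshold tau is an assumption).
        support_guess = np.flatnonzero(np.abs(d) > tau)
        return G_f, d, support_guess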

    Discussion: “A significance test for the lasso”

    the stimulating paper, which provides insights into statistical inference based on the lasso solution path. The authors propose novel covariance statistics for testing the significance of predictor variables as they enter the active set, which formalizes the data-adaptive test based on the lasso path. The observation that “shrinkage” balances “adaptivity” to yield an asymptotic Exp(1) null distribution is inspiring, and the mathematical analysis is delicate and intriguing. Adopting the notation of the paper under discussion, the main results are that, for orthogonal designs (Theorem 1), the covariance statistics satisfy $(T_{k_0+1}, T_{k_0+2}, \ldots, T_{k_0+d}) \overset{d}{\to} (\mathrm{Exp}(1), \mathrm{Exp}(1/2), \ldots, \mathrm{Exp}(1/d))$ (1); under the global null model (Theorem 2), $T_1 \overset{d}{\to} \mathrm{Exp}(1)$; and under the general model (Theorem 3), $P(T_{k_0+1} \geq t) \leq \exp(-t) + o(1)$. These remarkable results are derived under a number of critical assumptions, such as normality and the sure screening [borrowing the terminology of Fan and Lv (2008)] or model selection consistency of the lasso path. As pointed out in Fa
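
    The Exp(1) approximation for orthogonal designs can be checked directly by simulation: with orthonormal columns, the first two lasso/LARS knots are the two largest values of $|x_j'y|$, which under the global null with $\sigma = 1$ are the top order statistics of independent $|N(0,1)|$ variables, and the first covariance statistic reduces to $T_1 = \lambda_1(\lambda_1 - \lambda_2)$. The sketch below is only a Monte Carlo check of that tail approximation under these assumptions, not the authors' general construction along the full path.

    import numpy as np

    def first_covariance_stat(p, rng):
        """One draw of T_1 = lam1 * (lam1 - lam2) for an orthonormal design
        under the global null with sigma = 1. With orthonormal columns the
        first two knots lam1 >= lam2 are simply the two largest values of
        |x_j' y|, here iid |N(0,1)|, so no path algorithm is needed."""
        z = np.sort(np.abs(rng.standard_normal(p)))
        lam1, lam2 = z[-1], z[-2]
        return lam1 * (lam1 - lam2)

    rng = np.random.default_rng(0)
    T = np.array([first_covariance_stat(500, rng) for _ in range(5000)])
    # Compare the simulated upper tail with the Exp(1) tail exp(-t).
    for t in (1.0, 2.0, 3.0):
        print(f"t = {t}: simulated {np.mean(T >= t):.3f} vs Exp(1) {np.exp(-t):.3f}")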